Hang Xu

CoherenDream: Boosting Holistic Text Coherence in 3D Generation via Multimodal Large Language Models Feedback

Apr 28, 2025
Via arXiv

PaMi-VDPO: Mitigating Video Hallucinations by Prompt-Aware Multi-Instance Video Preference Learning

Apr 08, 2025
Via arXiv

ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement

Apr 03, 2025
Via arXiv

From Flatland to Space: Teaching Vision-Language Models to Perceive and Reason in 3D

Mar 29, 2025
Via arXiv

DynamiCtrl: Rethinking the Basic Structure and the Role of Text for High-quality Human Image Animation

Mar 27, 2025
Via arXiv

EDEN: Enhanced Diffusion for High-quality Large-motion Video Frame Interpolation

Mar 20, 2025
Via arXiv

ZO2: Scalable Zeroth-Order Fine-Tuning for Extremely Large Language Models with Limited GPU Memory

Mar 16, 2025
Via arXiv

Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k

Mar 12, 2025
Via arXiv

SemHiTok: A Unified Image Tokenizer via Semantic-Guided Hierarchical Codebook for Multimodal Understanding and Generation

Mar 09, 2025
Via arXiv

Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models?

Mar 08, 2025
Via arXiv